Abstract:

We study a simple heteropolymer model containing sequence-independent local interactions on both 
square and triangular lattices. Sticking to a two-letter code, we investigate the model for varying 
strength k of the local interactions; k = 0 corresponds to the well-known HP model [K.F. Lau and 
K.A. Dill, Macromolecules 22, 3986 (1989)]. By exhaustive enumerations for short chains, we obtain 
all structures which act as a unique and pronounced energy minimum for at least one sequence. We 
find that the number of such designable structures depends strongly on k. Also, we find that the 
number of designable structures can differ widely for the two lattices at a given k. This is the 
case, for example, at k = 0, which implies that the HP model exhibits different behavior on the 
two lattices. Our findings clearly show that sequence-independent local properties of the chains can 
play an important role in the formation of unique minimum energy structures. 
1 Introduction 



Natural proteins fold into unique compact structures in spite of the huge number of possible 
conformations [1]. It is widely believed that for most single-domain proteins, the native structure 
is the global free-energy minimum, but the mechanism that determines the structure is still not 
understood. It is also not known whether the property of having a unique native structure is common or 
rare among random polypeptides. It is therefore tempting to find out under which conditions and 
to what extent unique native structures appear in simple heteropolymer models. 

In recent years there has been an increasing interest in simple statistical-mechanical models for 
protein folding ||. Most of the models that have been studied are lattice-based with contact 
interactions only. Such models focus on heterogeneity as the primary force which drives the formation of 
unique native structures, and it has been found that the degree of heterogeneity plays an important 
role. For example, for the cubic lattice, it has been shown that it is possible to design twenty-letter 
sequences that have unique native structures, whereas two-letter sequences with this property seem 
to be rare j|, ||. However, a limitation of these models is that local interactions are neglected, and 
such interactions might be important not only for the local structure of the chains, as illustrated 
by a recent study of a simple three-dimensional off-lattice model [^). This model, with a simple 
two-letter code, was studied both with and without sequence-independent local interactions. In the 
presence of these interactions, it was shown that it is possible to find sequences that have compact, 
well-defined native structures. Without the local interactions, no such sequences were found. In 
this paper we investigate the effects of local interactions in some detail in two-dimensional lattice 
models using a two-letter code. 

Our starting point is the HP model of Lau and Dill, where the monomers are either hydrophobic 
(H) or hydrophilic/polar (P). This model contains no explicit local interactions, but the underlying 
lattice can be thought of in terms of local interactions. In order to study the influence of the lattice, 
we compare the behavior of the model on the square and triangular lattices. Our calculations are 
based on exhaustive enumerations of the full conformational space, for chains containing up to 
eighteen monomers on the square lattice and up to thirteen monomers on the triangular lattice. 
The triangular lattice has the advantage over the more widely used square lattice that it does not 
exhibit the well-known even-odd problem — on the square lattice it is impossible for two monomers 
at an even distance along the sequence to form a nearest-neighbor contact. 
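
The even-odd problem can be checked directly by brute force. The following is a minimal sketch, not the enumeration code used in the paper: the function names and the axial-coordinate representation of the triangular lattice are assumptions made for illustration. It collects the sequence separations |i - j| of all non-bonded nearest-neighbor pairs found in short self-avoiding chains.

```python
SQUARE = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# Triangular lattice in axial coordinates: six nearest neighbours per site.
TRIANGULAR = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def walks(n, steps):
    """Yield every self-avoiding walk with n monomers that starts at the origin."""
    def extend(path, occupied):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in steps:
            site = (x + dx, y + dy)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                yield from extend(path, occupied)
                occupied.remove(site)
                path.pop()

    yield from extend([(0, 0)], {(0, 0)})


def contact_separations(n, steps):
    """Set of |i - j| over all lattice-adjacent pairs that are not bonded."""
    neighbours = set(steps)
    separations = set()
    for path in walks(n, steps):
        for i in range(n):
            for j in range(i + 2, n):
                d = (path[j][0] - path[i][0], path[j][1] - path[i][1])
                if d in neighbours:
                    separations.add(j - i)
    return separations


print(sorted(contact_separations(7, SQUARE)))      # only odd separations occur
print(sorted(contact_separations(7, TRIANGULAR)))  # even separations occur too
```

On the square lattice every step changes the parity of x + y, so two monomers at an even separation sit on the same sublattice and can never be nearest neighbors; the triangular lattice has no such sublattice structure.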

The properties of the HP model on the square lattice are known in some detail from previous 
studies. In particular, it has been shown that about 2% of all HP sequences possess non- 
degenerate ground states on this lattice |)[ It turns out that such sequences are much more 
rare on the triangular lattice. There is, for example, no 13-mer with non-degenerate ground state 
on this lattice. 

Having seen this, we turn to a simple extension of the HP model which contains explicit sequence- 
independent local interactions. We study this model for varying strength k of the local interactions, 
focusing on the set of all ground states that are non-degenerate and separated from the rest of 
the states by a sufficiently large energy gap. Sequences having such ground states can design the 
corresponding structures, and the number of sequences that can design a given structure will be 
called the designability of this structure [11]. Every structure that can be designed by at least one 
sequence will be called designable. 






We find that the number of designable structures is strongly k-dependent, and that it can differ 
widely for the two lattices at a given k. The difference is particularly large at k = 0, which 
corresponds to the HP model. However, the maximum numbers of designable structures for the two 
lattices are comparable, and for both lattices there is a pronounced peak in the number of designable 
structures at k = -0.5. At this k the typical designable structure is compact with many turns. 
Focusing on maximally compact structures, we study the designability of individual structures at 
this k. We find that the designability tends to be much higher on the square lattice. On this lattice 
we find that there are certain compact structures which can be designed by a very large number of 
sequences. This finding is in line with the results of Li et al. [11]. However, it is important to note 
that the results for the triangular lattice are different, which indicates that the emergence of such 
structures to some extent is related to the even-odd problem. 

Our results clearly show that sequence-independent local interactions can play an important role 
in the formation of unique minimum energy structures. Although our study is confined to two 
dimensions, we expect this to hold in three dimensions as well. In fact, one may expect such 
interactions to be even more important in three dimensions, where the flexibility of the chains is 
greater. 



2 Methods 



The chains studied are linear and self-avoiding, and contain two monomer types, H and P. A sequence 
is specified by a choice of monomer types at each position on the chain, {σ_i}, where σ_i takes the 
values H and P and i is a monomer index. A structure is specified by a set of coordinates for all the 
monomers, {x_i}, and the bend angle formed by sites x_{i-1}, x_i and x_{i+1} will be denoted by θ_i. The 
energy of a structure is given by sequence-independent local interactions and sequence-dependent 
nearest-neighbor contact interactions, 

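$$E = k E_L + E_G \tag{1}$$

$$E_L = 2 \sum_{i=2}^{N-1} (1 - \cos\theta_i) \tag{2}$$

$$E_G = \sum_{1 \le i < j \le N} \epsilon_{\sigma_i \sigma_j} \Delta(x_i - x_j) \tag{3}$$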

where Δ(x_i - x_j) = 1 if x_i and x_j are nearest neighbours on the lattice but i and j are not adjacent 
positions along the sequence, and Δ(x_i - x_j) = 0 otherwise. The energy depends on three parameters 
ε_HH, ε_HP and ε_PP, which will be held fixed throughout the paper. Following Lau and Dill we take 

$$\epsilon_{HH} = -1, \qquad \epsilon_{HP} = \epsilon_{PP} = 0 \tag{4}$$

The remaining parameter k determines the strength of the local interactions. For k = 0 the model is 
identical to the HP model, the energy being given by minus the number of topological HH contacts 
(two monomers i and j are in topological contact if Δ(x_i - x_j) = 1). 
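
As an illustration of Eqs. 1-4, the energy of a given structure and sequence on the square lattice could be evaluated as in the sketch below. Two assumptions are made here and should be read as such: the bend angle θ_i is measured from the straight continuation of the chain (so that the rod-like structure has E_L = 0), and E_L carries a prefactor 2, which is the normalization that reproduces the maximum values of E_L quoted for N = 13 in Sec. 3.1.2. The function names are hypothetical.

```python
SQUARE_STEPS = {(1, 0), (0, 1), (-1, 0), (0, -1)}


def local_energy(path):
    """E_L = 2 * sum_{i=2}^{N-1} (1 - cos(theta_i)) for a square-lattice path.

    theta_i is the deviation from a straight continuation (assumption), so a
    straight segment contributes 0 and a 90-degree turn contributes 2.
    """
    bonds = [(q[0] - p[0], q[1] - p[1]) for p, q in zip(path, path[1:])]
    # Bonds are unit vectors, so their dot product equals cos(theta_i).
    return sum(2 * (1 - (u[0] * v[0] + u[1] * v[1])) for u, v in zip(bonds, bonds[1:]))


def contact_energy(path, seq, eps_hh=-1):
    """E_G with eps_HP = eps_PP = 0: only topological HH contacts contribute."""
    energy = 0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):  # skip pairs bonded along the chain
            d = (path[j][0] - path[i][0], path[j][1] - path[i][1])
            if d in SQUARE_STEPS and seq[i] == "H" and seq[j] == "H":
                energy += eps_hh
    return energy


def total_energy(path, seq, k):
    """E = k * E_L + E_G (Eq. 1)."""
    return k * local_energy(path) + contact_energy(path, seq)


# A four-monomer U-shaped structure with H monomers at both ends.
u_shape = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(total_energy(u_shape, "HPPH", k=0.0))   # HP model: one HH contact
print(total_energy(u_shape, "HPPH", k=-0.5))  # negative k also rewards the two turns
```

For k = 0 the result is just minus the number of HH contacts, as in the HP model; a negative k lowers the energy of structures with many turns, while a positive k favors straight segments.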

In our calculations we focus on the energy spectrum. For a given number of monomers, N, we 
compute this for all the 2^N possible sequences, by numerical enumeration of the full conformational 
space. In this way we determine all sequences having a ground state which is non-degenerate and 
separated from the rest of the spectrum by a sufficiently large energy gap. We say that such a 
sequence can design its ground state structure, and each structure that can be designed by at least 
one sequence will be called designable. The number of designable structures will be denoted by D_N 
for chains with N monomers. 

These definitions involve a parameter ΔE; the gap between the ground state and the next lowest 
level is required to be greater than or equal to ΔE. The choice of this parameter is somewhat 
arbitrary. We tested several different values and decided to use ΔE = 1, corresponding to the 
energy of one HH contact. Small changes of ΔE lead to qualitatively similar results. 
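
The procedure described above can be sketched for very short chains on the square lattice as follows. This is an illustrative reimplementation under stated assumptions, not the authors' program: the names `conformations` and `designable` and the tolerance `tol` are hypothetical, E_L is taken with a prefactor 2 and with the bend angle measured from the straight continuation, and rotations and reflections are removed by fixing the first bond and the direction of the first turn.

```python
from itertools import product

STEPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # square lattice


def conformations(n):
    """Self-avoiding walks with n >= 2 monomers, one per rotation/reflection class.

    The first bond is fixed along +x (removes rotations) and the first turn,
    if there is one, must point towards +y (removes mirror images).
    """
    result = []

    def extend(path, occupied, turned):
        if len(path) == n:
            result.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            if not turned and dy < 0:
                continue  # mirror image of a walk whose first turn is towards +y
            site = (x + dx, y + dy)
            if site in occupied:
                continue
            path.append(site)
            occupied.add(site)
            extend(path, occupied, turned or dy != 0)
            occupied.remove(site)
            path.pop()

    extend([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)
    return result


def designable(n, k=0.0, delta_e=1.0, tol=1e-9):
    """Return (S_N, D_N, designability per structure index) for chain length n."""
    data = []
    for path in conformations(n):
        bonds = [(q[0] - p[0], q[1] - p[1]) for p, q in zip(path, path[1:])]
        e_l = sum(2 - 2 * (u[0] * v[0] + u[1] * v[1]) for u, v in zip(bonds, bonds[1:]))
        index = {site: i for i, site in enumerate(path)}
        contacts = []
        for i, (x, y) in enumerate(path):
            for dx, dy in STEPS:
                j = index.get((x + dx, y + dy), -1)
                if j > i + 1:  # lattice neighbours that are not bonded
                    contacts.append((i, j))
        data.append((e_l, contacts))

    designs = {}  # structure index -> number of sequences that design it
    for seq in product("HP", repeat=n):
        levels = sorted(
            (k * e_l - sum(1 for i, j in contacts if seq[i] == seq[j] == "H"), c)
            for c, (e_l, contacts) in enumerate(data)
        )
        # Unique ground state separated from the next level by at least delta_e.
        if len(levels) == 1 or levels[1][0] - levels[0][0] >= delta_e - tol:
            ground = levels[0][1]
            designs[ground] = designs.get(ground, 0) + 1
    return sum(designs.values()), len(designs), designs


print(len(conformations(6)))  # number of distinct 6-monomer conformations
print(designable(4)[:2])      # (S_N, D_N) for N = 4 at k = 0
```

The cost grows as the number of conformations times 2^N, so this sketch is only practical for small N; its k = 0 output for short chains can be compared with the square-lattice columns of Table 1.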

The gap criterion is important when studying general k. To illustrate this, let us consider N = 13 
chains on the triangular lattice. Here one finds that all ground states are degenerate at k = 0, while 
4328 of the sequences have non-degenerate ground states at small, positive k. However, each ground 
state becomes effectively degenerate for sufficiently small k, since all the gaps vanish in this limit. 

Our choice of gap criterion implies that we look for ground states corresponding to a single structure 
on the lattice. For longer chains it would probably be more relevant to consider the gap between 
the ground state and the lowest of all states with little structural similarity to the ground state [12], 
[13], [14]. In principle, it would be more appropriate to formulate the gap criterion in terms of a 
normalized gap ΔE/E_k, E_k being a k-dependent energy scale. However, the variation of E_k can, 
for our purposes, be neglected in the k region that is of primary interest (small and moderate |k|). 

The total number of sequences that can design structures will be denoted by S_N for chains with 
N monomers. In general S_N is greater than D_N, since different sequences may have the same 
ground state structure. The difference between S_N and D_N is particularly large in the trivial limit 
k → ∞. In this limit, there is one structure, the rod-like structure which minimizes E_L, which can 
be designed by all sequences, so S_N is equal to the size of the full sequence space while D_N = 1. 
In the limit k → -∞ the situation is similar but slightly different. On the triangular lattice there 
is again one structure, a zig-zag pattern, which can be designed by all sequences. This structure is 
the unique maximum of E_L. On the square lattice there are, by contrast, many different structures 
that maximize E_L. 
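
The two limits can be illustrated numerically. The sketch below is a hedged example, not the original code: it assumes E_L with a prefactor 2 and a bend angle measured from the straight continuation, represents the triangular lattice in axial coordinates, and uses hypothetical function names. It evaluates E_L for the rod, a square-lattice staircase and the triangular zig-zag, and counts how many walks reach the maximum of E_L for a short chain.

```python
SQUARE = [(1, 0), (0, 1), (-1, 0), (0, -1)]
TRIANGULAR = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]  # axial coordinates


def two_cos(u, v, triangular):
    """Twice the cosine of the angle between unit bonds u and v (always an integer)."""
    dot2 = 2 * (u[0] * v[0] + u[1] * v[1])
    if triangular:
        dot2 += u[0] * v[1] + u[1] * v[0]  # axial basis vectors are 60 degrees apart
    return dot2


def local_energy(bonds, triangular):
    """E_L = 2 * sum (1 - cos(theta_i)) over consecutive bond pairs."""
    return sum(2 - two_cos(u, v, triangular) for u, v in zip(bonds, bonds[1:]))


def maximizers(n, steps, triangular):
    """Return (max E_L, number of n-monomer walks reaching it), first bond fixed."""
    best, count = -1, 0

    def extend(path, occupied, bonds):
        nonlocal best, count
        if len(path) == n:
            e_l = local_energy(bonds, triangular)
            if e_l > best:
                best, count = e_l, 1
            elif e_l == best:
                count += 1
            return
        x, y = path[-1]
        for step in steps:
            site = (x + step[0], y + step[1])
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                bonds.append(step)
                extend(path, occupied, bonds)
                bonds.pop()
                occupied.remove(site)
                path.pop()

    extend([(0, 0), steps[0]], {(0, 0), steps[0]}, [steps[0]])
    return best, count


rod = [(1, 0)] * 12               # N = 13 straight chain
staircase = [(1, 0), (0, 1)] * 6  # N = 13, a turn at every interior monomer
zigzag = [(1, 0), (-1, 1)] * 6    # N = 13 zig-zag on the triangular lattice
print(local_energy(rod, False), local_energy(staircase, False), local_energy(zigzag, True))
print(maximizers(6, SQUARE, False))     # several maximizing walks
print(maximizers(6, TRIANGULAR, True))  # only the zig-zag and its mirror image
```

With the first bond fixed, each maximizing structure is counted together with its mirror image, so the walk counts returned by `maximizers` should be halved to obtain the number of distinct structures.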



3 Results 



3.1 The number of designable structures 
3.1.1 k = 0 

We first consider the model in absence of the local interactions (k = 0). Energy gaps can then 
take integer values only, so, with our choice of ΔE, the gap criterion is met by all sequences with 
non-degenerate ground states. Hence, S_N is here simply the number of N-mers with non-degenerate 
ground states. In Table 1 we show the quantities S_N and D_N for different N on the square and 
triangular lattices. Our results for S_N on the square lattice can be compared with those of Chan and 
Dill [9], and are consistent with these. Also shown in Table 1 are the total numbers of different 
conformations, unrelated by simple symmetries, for different N. 








      square                               triangular
  N   No. of conformations    S_N    D_N   No. of conformations    S_N    D_N
  3                      2      0      0                      3      2      1
  4                      5      4      1                     12      2      2
  5                     13      0      0                     52      1      1
  6                     36      7      3                    228      0      0
  7                     98     10      2                    996      0      0
  8                    272      7      5                   4324      0      0
  9                    740      6      4                  18678      0      0
 10                   2034      6      4                  80345      2      2
 11                   5513     62     14                 344431      6      6
 12                  15037     87     25                1472412      2      2
 13                  40617    173     52                6279601      0      0
 14                 110188    386    130
 15                 296806    857    218
 16                 802075   1539    456
 17                2155667   3404    787
 18                5808335   6349   1475

Table 1: S_N and D_N for different N for the HP model (k = 0) on the square and triangular lattices. 
Also shown are the total numbers of conformations for different N. 



From Table 1 it can be seen that S_N increases roughly linearly with the total number of sequences 
on the square lattice; the fraction of sequences having non-degenerate ground states varies between 
2.1 and 2.6% for 12 ≤ N ≤ 18. Over the same N range the number of designable structures 
satisfies 0.23 S_N < D_N < 0.34 S_N, so on average each designable structure can be designed by 
2.9-4.3 sequences. These results contrast sharply with those for the triangular lattice, where S_N and 
D_N are much smaller. Also, there is no structure on the triangular lattice that can be designed by 
more than one sequence for 4 ≤ N ≤ 13. 
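
The ratios quoted in this paragraph follow directly from the square-lattice columns of Table 1. A short check (the dictionary below simply restates the table entries for 12 ≤ N ≤ 18; the variable and function names are illustrative):

```python
# Square-lattice entries of Table 1 for 12 <= N <= 18: N -> (S_N, D_N).
TABLE_SQUARE = {
    12: (87, 25),
    13: (173, 52),
    14: (386, 130),
    15: (857, 218),
    16: (1539, 456),
    17: (3404, 787),
    18: (6349, 1475),
}


def summarize(table):
    """Return rows (N, percent of designing sequences, D_N/S_N, S_N/D_N)."""
    rows = []
    for n, (s_n, d_n) in sorted(table.items()):
        rows.append((n, 100.0 * s_n / 2**n, d_n / s_n, s_n / d_n))
    return rows


for n, percent, ratio, per_structure in summarize(TABLE_SQUARE):
    print(f"N={n:2d}  S_N/2^N = {percent:.2f}%  D_N/S_N = {ratio:.3f}  S_N/D_N = {per_structure:.2f}")
```

Here 2^N is the total number of two-letter sequences of length N, and S_N/D_N is the average number of sequences per designable structure.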



3.1.2 k ≠ 0 



We now turn to general k. In Fig. 1 we show the k dependence of D_N for N = 13. The large-|k| 
behavior of D_N is trivial, as discussed above. However, as can be seen from Fig. 1, there is a 
small-|k| region where D_N shows an interesting and strong k dependence. For both lattices there 
is a sharp peak at k = -0.5. For the square lattice there is another, slightly higher peak at k = 0, 
corresponding to the HP model. Such a peak is missing for the triangular lattice, which leads to 
the big difference in the results for the HP model. 

In order to study the character of the designable structures, we computed average values of the 
total number of topological contacts, C, which is a measure of compactness, and E_L (see Eq. 2). 
In Fig. 2 these quantities are plotted against k, using N = 13. Each data point in this figure is an 
average over all the designable structures at a given k. For N = 13, the maximum value of C is 
6 on the square lattice and 14 on the triangular lattice, whereas the maximum value of E_L is 22 
on the square lattice and 33 on the triangular lattice. From Fig. 2 it can be seen that C and E_L 
vary rapidly with k. Both quantities are large at the peaks at k = -0.5, showing that the typical 
designable structure at this k is compact with many turns. 

Figure 1: The number of designable structures, D_N, versus k for N = 13, (a) square and (b) 
triangular lattice. 



A closer look at the designable structures at k = -0.5 shows that roughly half of them are maximally 
compact, in the sense that they have maximum C; this holds for 19 out of 48 structures on the 
square lattice, and for 50 out of 98 structures on the triangular lattice. An example of a designable 
structure at k = -0.5 which is not maximally compact is the zig-zag structure that maximizes E_L 
on the triangular lattice. A structure such as this is less interesting than the maximally compact 
ones from the viewpoint of design. In our study below of the designability of individual structures, 
we focus on maximally compact structures. 



Local properties of the designable structures, such as E_L, are strongly k-dependent. To illustrate 
this, we show in Fig. 3 two designable structures on the triangular lattice corresponding to k = -0.5 
and k = 0.5, respectively. Both these conformations are maximally compact. Another important 
property which they share is that they display strong regularities in the local structure, reminiscent 
of the secondary structure in real proteins. The two conformations differ markedly, however, in the 
precise form of the local structure. 



3.2 The designability of individual structures 



So far we have classified the structures in a binary way as either designable or not. It is also 
interesting to see to what extent those structures that are designable differ in designability, the 
designability being defined as the number of sequences that can design a given structure. 

Recently, Li et al. [11] studied the designability of individual structures in an HP-like model on the 
square and cubic lattices. These calculations were performed using the restricted conformational 
space consisting of all maximally compact structures, and the energy function was given by Eqs. 1-3 
with ε_HH = -2.3, ε_HP = -1 and ε_PP = k = 0. Large variations in designability were observed. In 
particular, it was shown that certain structures can be designed by a huge number of sequences. 

Figure 2: Averages of C (the number of topological contacts) and E_L (see Eq. 2) over the designable 
structures at different k for N = 13: (a) C, square lattice; (b) C, triangular lattice; (c) E_L, square 
lattice; (d) E_L, triangular lattice. For the triangular lattice, there are k values at which no designable 
structures were found (see Fig. 1). The lines connecting the data points are dotted in this region. 

In order to test the generality of these findings, we have performed analogous calculations for our 
model using k = -0.5 and N = 13. The designability was computed for each of the maximally 
compact structures, as discussed above. Let us stress, however, that, in contrast to Li et al. [11], we 
base our definition of designability upon the full set of all possible structures rather than the subset 
of maximally compact structures. 

Comparing our results for the square and triangular lattices, we find that the designability tends to 
be much higher on the square lattice. The average designability is 82.1 for this lattice, compared to 
4.5 for the triangular lattice. The highest designabilities we measured are 590 and 12, respectively, 
for the square and triangular lattices. The variations in designability are, as in the model studied 
by Li et al. [11], very large on the square lattice. 

The observed differences in designability for the two lattices are striking, but perhaps not surprising 
in view of the even-odd problem. The fact that certain contacts cannot be formed on the square 
lattice tends, for a given structure, to increase the degree of degeneracy with respect to the sequence 
degrees of freedom. 



4 Summary and discussion 



We have studied ground state properties of a simple heteropolymer model containing sequence- 
independent local interactions. By enumeration of the full conformational space for short chains, we 
calculated the degeneracy of the ground state and the gap to the next lowest level for all possible 
sequences. In this way we obtained all structures that are designable. Our results show that 
the number of designable structures depends strongly on the strength k of the local interactions. 
Furthermore, we have seen that the behavior on the square and triangular lattices can be very 
different at a given k. An example of this is the HP model (k = 0). As our study of non-zero 
k shows, the difference in behavior of the HP model on the two lattices can to a certain degree 
be compensated for by introducing the local interactions. However, there seems to be at least one 
important difference between the properties on the two lattices. Our study of the designability 
of maximally compact structures shows that this tends to be much higher on the square lattice. 
Furthermore, on this lattice we find that, as in the model studied by Li et al. [11], there are certain 
compact structures that can be designed by a huge number of sequences. No such structures were 
found on the triangular lattice. 

The effects of local interactions were recently studied in a simple three-dimensional off-lattice model 
with two-letter code ||. Our findings are in good agreement with the results of this study, but 
may at first sight seem to contradict the results obtained by Govindarajan and Goldstein [15]. It should 
therefore be stressed that the system studied by Govindarajan and Goldstein is very different from 
ours. They studied the behavior of optimized sequences in a model with a much higher degree of 
heterogeneity, in which the local interactions are sequence-dependent. 



Starting with the work of Flory [16], there have been a number of studies of homopolymer models 
containing local interactions similar to those we have considered here. From the viewpoint of protein 
folding, the model studied by Kolinski et al. [17], [18] appears particularly interesting. This model 
contains, in addition to the local interactions, also attractive nearest-neighbor contact interactions. 
Its phase diagram for k > 0 exhibits coil and globule phases, as well as a folded low-temperature 
phase [19], [20]. The behavior of this model for k < 0 has to our knowledge not been investigated. 



The HP model, without local interactions, has recently been utilized by Camacho and Schanke 
in order to study the role of crosslinks in polymers. There have been several recent studies of this 
important issue based on Gaussian models [^2[. Camacho and Schanke took a different 
approach and examined the zero temperature limit of the HP model, where the surviving structures 
are those that have the maximum number of HH links. Their study was carried out using the 
square lattice, and it would be interesting to see to what extent the results remain unchanged on 
the triangular lattice. The behavior of the HP model on the triangular lattice has been studied 
previously in other contexts, recently by Seno et al. pq]. 